Abstract In the context of preventing and fighting crime, the analysis of mobile phone traffic among actors of a criminal network helps to reconstruct illegal activities on the basis of the relationships connecting those specific individuals. Forensic analysts and investigators therefore require advanced tools and techniques that allow them to manage these data in a meaningful and efficient way. In this paper we present LogAnalysis, a tool we developed to provide visual data representation and filtering, statistical analysis features and the possibility of a temporal analysis of mobile phone activities. Its adoption may help in unveiling the structure of a criminal network and the roles and dynamics of communications among its components. By using LogAnalysis, forensic investigators could gain a deeper understanding of the hierarchies within criminal organizations, for example by discovering central members that provide connections among different sub-groups. Moreover, by analyzing the temporal evolution of the contacts among individuals, or by focusing on specific time windows, they could acquire additional insights on the data they are analyzing. Finally, we show how the adoption of LogAnalysis may be crucial to solving real cases, providing as examples a number of case studies inspired by real forensic investigations led by one of the authors.
1 Introduction

The increasing usage of mobile phones in everyday life is also reflected in their illicit adoption. For example, mobile communication devices are exploited by criminal organizations in order to coordinate illegal activities, to communicate decisions, etc. In order to prevent and fight crime, mobile communication service providers (according to the regulatory legislation of the State in which they operate) have to store for a given period all the data related to the phone traffic, in the form of log files. These logs contain information about phone calls, attempted calls, Short Message Service (SMS), Multimedia Messaging Service (MMS), General Packet Radio Service (GPRS) and Internet sessions. Additional information could be inferred from traffic produced by Cell Global Identities (CGI)¹ inside their areas.

The analysis of reports supplied by mobile phone service providers makes it possible to reconstruct the network of relationships among individuals, such as in the context of criminal organizations. It is possible, in other words, to unveil the existence of criminal networks, sometimes called rings, identifying actors within the network together with their roles. These techniques of forensic investigation are well known, and are rooted in Social Network Analysis (SNA). The structure of criminal networks can be efficiently formalized by means of graphs, whose nodes represent actors of the criminal organizations (or, in our case, their mobile phones) and whose edges represent the connections among them (i.e., their phone communications). Representing the data extracted from log files as a graph is a simple task, while its interpretation may prove hard when large volumes of data are involved. In fact, it can become difficult to find anomalous values and patterns while browsing a large quantity of data. Moreover, visual representations of a high number of individuals and connections easily become unreadable because nodes and edges overlap each other. Powerful support comes from SNA, which provides methods to evaluate the importance of particular individuals within a network and of the relationships among them. For example, SNA provides statistical algorithms that find those individuals/nodes in key positions and those acting as cohesive elements.

1 CGI is a standard identifier for mobile phone cells which provides geographical positioning of mobile phones.

In this work we present a novel tool we developed, 
named LogAnalysis, for forensic visual statistical anal- 
ysis of mobile phone traffic logs. LogAnalysis graphi- 
cally represents the relationships among mobile phone 
users with a node-link layout. It helps to explore the 
structure of a graph, measuring connectivity among 
users and giving support to visual search and auto- 
matic identification of organizations and groups within 
the network. To this purpose, LogAnalysis integrates the graphical representation of networks with metrics and measures typical of SNA, in order to help detectives or forensic analysts understand the structure of criminal associations while highlighting key members inside the criminal ring, and/or those members working as links among different associations. Several statistical measures have been implemented and made available to the investigators, seamlessly integrated with the visual part. An additional feature is the possibility of analyzing the temporal evolution of the connections among actors of the network, for example focusing on particular time windows in order to obtain additional insights about the dynamics of communications before/during/after particular criminal events. The main features of LogAnalysis are described together with a number of case studies, inspired by a real criminal investigation led by one of the authors and successfully solved also by exploiting features provided by LogAnalysis.



2 Related work 

Law enforcement and intelligence agencies frequently face the problem of extracting information from large amounts of raw data coming from heterogeneous sources, among which are phone call printouts. In recent years, a growing number of commercial software packages have been developed that employ analytical visualization techniques to help investigations. In the following we briefly describe, to the best of our knowledge, the most successful among them.

Analyst's Notebook from i2 Inc.² provides a semantic graph visualization to assist analysts with investigations. Nodes in the graph are entities of semantic data types such as persons, events, organizations, bank accounts, etc. While the system can import text files and do automatic layout, its primary application appears to be helping analysts in manually creating and refining case charts.

The COPLINK system [6] and the related suite of tools have a twofold goal: to ease the extraction of information from police case reports and to analyze criminal
networks. A conceptual space of entities and objects 
is built exploiting data mining techniques in order to 
help in finding relations between entities. It also pro- 
vides a visualization support consisting of a hyperbolic 
tree view and a spring-embedder graph layout of rele- 
vant entities. Furthermore, COPLINK is able to opti- 
mize the management of information exploited by po- 
lice forces integrating in a unique environment data re- 
garding different cases. This is done in order to enhance 
the possibility of linking data from different criminal in- 
vestigations to get additional insights and to compare 
them in an analytic fashion. 

TRIST [TI5] allows analysts to formulate, refine, orga- 
nize and execute queries over large document collec- 
tions. Its user interface provides different perspectives 
on search results including clustering, trend analysis, 
comparisons, and difference. Information retrieved by TRIST can then be loaded into the SANDBOX system [35], an analytical sense-making environment that helps to sort, organize, and analyze large amounts of data.
The system offers interactive visualization techniques 
including gestures for placing, moving, and grouping in- 
formation, as well as templates for building visual mod- 
els of information and visual assessment of evidence. 
Similarly to COPLINK, TRIST is optimized to query 
large databases and to analytically compare results. 

Differently from COPLINK and TRIST, LogAnalysis adopts an approach that is not based on querying data but relies on the full visual presentation and analysis of such information, represented by means of network graphs. The strength of our tool is the adoption of several interactive layout techniques that highlight different aspects and features of the considered networks and allow the inspection of the elements (nodes and edges) that constitute the network itself.

2 i2 - Analyst's Notebook, http://www.i2inc.com/

Another remarkable tool is GeoTime [20], which visualizes the spatial interconnectedness of information over time overlaid onto a geographical substrate. It uses an interactive 3D view to visualize and track events, objects, and activities both temporally and geo-spatially. One difference between GeoTime and LogAnalysis is that the analysis of the spatial dependency of data is not yet supported by our tool, and this makes GeoTime a useful complement to LogAnalysis for such types of investigations. On the other hand, the functionalities provided by LogAnalysis for the analysis of temporal dependencies of data improve on those provided by GeoTime, as highlighted in Sections 5.5 and 5.7.

As an example of the various general-purpose tools for analyzing social networks (as opposed to tools specifically designed to investigate telecom networks), we mention NodeXL [32], an extensible toolkit for network overview, discovery and exploration implemented as an add-on to the Microsoft Excel 2007/2010 spreadsheet. NodeXL is open source and was designed to facilitate learning the concepts and methods of Social Network Analysis with visualization as a key component. It integrates metrics, statistical methods, and visualization to gain the benefit of all three approaches. As for the usage of network metrics to assess the importance of actors in the network, NodeXL shares a paradigm similar to the one we adopted in LogAnalysis, although it lacks the features of our tool related to the temporal analysis of networks.

Regarding research that applies Social Network Analysis to topics related to this work, T. von Landesberger et al. [21] recently surveyed the available techniques for the visual analysis of large graphs. Graph visualization techniques are shown and various graph algorithmic aspects are discussed, which are useful for the different stages of the visual graph analysis process. In this work we took up a number of the challenges proposed by [21], trying to address for example the problem of large-scale network visualization for ad-hoc problems (in our case, the study of phone telecom networks).

Also the analysis of phone call networks has been a 
subject of intensive study. Mellars [22] investigated the 
principal ways a phone call network operates and how 
data are processed. Particular attention has been given 
to the methodology of investigation of data about the 
phone activity that it is possible to collect directly from 
the devices. 



More recently, different works [27,25,26,3] used mobile phone call data to examine and characterize the social interactions among cell phone users. They analyze phone traffic networks consisting of the mobile phone call records of millions of individuals.

In detail, in [25,26] the authors present the statistical features of a large-scale Belgian phone call network comprising 4.6 million users and 7 million links. That study highlights some features typical of large social networks [12] that also characterize telecom networks, such as the fragmentation into small clusters and the presence of strong and weak ties among individuals. In addition, in [27] the authors discuss an exceptional feature of that network, which is the division into two large communities corresponding to two different language users (i.e., Dutch and French speakers of the Belgian network).

The community structure of phone telecom networks has been further investigated in [3]. The authors exploited an efficient community detection algorithm called the Louvain method [3,8] to assess the presence of a community structure and to study its features in a large phone network of 2.6 million individuals.

Finally, in recent years Eagle et al. [10,11] investigated the possibility of inferring a friendship
social network based on the data from mobile phone 
traffic of the same individuals. This problem attracted 
the attention of other recent studies [4ll33j , particularly 
devoted to understand the dynamics of social connec- 
tions among individuals by means of mobile phone net- 
works. 



2.1 Contribution of this work 

LogAnalysis was originally presented in a preliminary version in late 2010 [5] and was positively received by the research community of forensic analysts and social network analysts.

We argue that the further developments of this tool have increased its potential and performance. In particular, the research direction that we are following with LogAnalysis is devoted to including the possibility of analyzing temporal information from phone call networks, and the tool has been specifically optimized to study mobile phone telecom networks, whose analysis has attracted relevant research efforts in the recent period [30]. Additional efforts have been made to improve the possibilities provided by LogAnalysis to unveil and study, by means of different community detection techniques, the community structure of the networks, whose importance has been assessed in recent years in a number of works [29,16].



Our tool introduces a number of novelties with respect to similar platforms existing to date. In detail, LogAnalysis primarily differs from the systems described above in that we focused on the visual representation of the relationships among entities in phone calls. We adopted different state-of-the-art view layouts to promote fast exploration and discovery of the analyzed networks.

Furthermore, our tool provides a system model which aims at improving the quality of the analysis of social relationships of the network through the integration of visualization and SNA-based statistical techniques, which is a relevant topic in the ongoing research in Social Network Analysis [31].

In this respect, LogAnalysis has proved to be an invaluable support during real investigations carried out by professional forensic analysts, in particular in the context of analyzing large-scale mobile telecom networks exploited for criminal purposes.

One of the merits of this work, in fact, is to analyze several different real-world use cases inspired by forensic investigations carried out by one of the authors. During these investigations, LogAnalysis has been exploited to examine the structural features of criminal phone call networks with a systematic methodology adopting a single tool, differently from previous cases in which a combination of different SNA-based and digital forensic tools had to be adopted to reach similar results. Some relevant information about the usage of LogAnalysis in the context of real-world investigations is reported in this work. To the best of our knowledge, this is the first work to present critical information from real forensic investigations in mobile phone call networks, dealing with real data acquired from actual criminal cases. We also provide some evidence supporting our claim about the advantages of adopting LogAnalysis to unveil possible criminal connections among actors of mobile telecom networks.



3 Analysis of Mobile Phone Traffic Networks 

The relationships established by means of phone calls may be explored using different techniques and approaches. Sometimes, forensic analysis relates to phone traffic made by International Mobile Subscriber Identity (IMSI)³ and by International Mobile Equipment Identity (IMEI)⁴. Detectives generally distinguish three main types of analysis of phone traffic logs: i) relational, in order to show links (and hence acquaintance) among individual users; ii) spatial, helpful to show geographical displacements of a mobile phone in order to assess the location of an individual before, during and after a crime has been committed; and iii) temporal, useful to discover, for example, at what time a phone call has been made or an SMS has been sent, which contacts were involved in a phone conversation or how long an Internet connection lasted. LogAnalysis provides tools to investigate the relational and temporal aspects of phone call networks.

Fig. 1 Architecture of LogAnalysis.

3 IMSI is a unique number associated with all GSM and UMTS network mobile phone users. It is stored in the SIM inside the phone and is sent by the phone to the network.
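As an illustration of the temporal type of analysis listed above, the following minimal Python sketch selects the records that fall inside a time window around an event of interest. It is not taken from LogAnalysis (which is written in Java): the function name `records_in_window`, the record layout and the sample data are all hypothetical and serve only to make the idea concrete.

```python
from datetime import datetime, timedelta


def records_in_window(records, event_time, before, after):
    """Keep the records whose start time lies in [event_time - before, event_time + after]."""
    start, end = event_time - before, event_time + after
    return [r for r in records if start <= r["start"] <= end]


# Hypothetical records: calling user, called user, start time (GMT).
records = [
    {"calling": "A", "called": "B", "start": datetime(2010, 5, 1, 21, 50)},
    {"calling": "B", "called": "C", "start": datetime(2010, 5, 1, 22, 10)},
    {"calling": "A", "called": "C", "start": datetime(2010, 5, 2, 9, 0)},
]

# Thirty minutes before and after a hypothetical event at 22:00.
event = datetime(2010, 5, 1, 22, 0)
nearby = records_in_window(records, event, timedelta(minutes=30), timedelta(minutes=30))
```

Restricting the records in this way before building the graph yields the network of contacts active around the event only, which is the kind of view a time-window analysis relies on.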

The architecture of LogAnalysis, shown in Figure 1, is organized in extensible levels: i) import of data provided by the information systems of mobile phone service providers (usually in the form of textual log files); ii) conversion of data to the GraphML⁵ format, a structured XML format, more suitable for graphical representation and portability among several different graph drawing applications; iii) visualization and dynamic exploration of the obtained mobile phone traffic network.
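The first two levels (import and conversion to GraphML) can be sketched as follows. This is only an illustrative Python example under stated assumptions, not the Java implementation of LogAnalysis: the column names `calling` and `called`, the tab delimiter and the function name `log_to_graphml` are hypothetical, and every record is counted towards the edge weight regardless of its type.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def log_to_graphml(log_text, delimiter="\t"):
    """Convert a delimited log with 'calling' and 'called' columns to GraphML."""
    reader = csv.DictReader(io.StringIO(log_text), delimiter=delimiter)
    # Count the records for every ordered (calling, called) pair.
    calls = Counter((row["calling"], row["called"]) for row in reader)

    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    # Declare an integer edge attribute holding the number of records.
    ET.SubElement(root, "key", {"id": "weight", "for": "edge",
                                "attr.name": "weight", "attr.type": "int"})
    graph = ET.SubElement(root, "graph", id="G", edgedefault="directed")

    for number in sorted({n for pair in calls for n in pair}):
        ET.SubElement(graph, "node", id=number)
    for (source, target), count in sorted(calls.items()):
        edge = ET.SubElement(graph, "edge", source=source, target=target)
        ET.SubElement(edge, "data", key="weight").text = str(count)
    return ET.tostring(root, encoding="unicode")


# Hypothetical three-record log.
sample = "calling\tcalled\ttype\nA\tB\tvoice\nA\tB\tsms\nB\tC\tvoice\n"
graphml = log_to_graphml(sample)
```

The resulting string can be saved to a file and opened by graph drawing tools that read GraphML, which is the portability argument made above.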

An example of the usage of this tool is the search for particular elements in the network. To this purpose, it is possible to visually discover subsets and gangs (or rings) by measuring their cohesion in terms of the density of internal connections. Thus, the elements of interest for investigations are extracted from the overall structure of the network. In fact, some nodes are prominent due to their high degree of connection with others, other nodes for their strategic position of centrality in terms of connections, etc. A number of complete case studies are analyzed in Section 5, describing some features provided by LogAnalysis that have been exploited in the context of a real-world investigation.



4 IMEI is a unique 17 or 15 digit code used to identify an 
individual mobile station to a GSM or UMTS network. 

5 http://graphml.graphdrawing.org/
3.1 Implementation

Our system is implemented in Java and integrates several open-source toolkits. In particular, Prefuse⁶ provides the underlying node-link data structures and has been used to support the dynamic exploration of networks, according to force-directed and radial models, and to identify communities. JUNG⁷ has been used to implement some of the SNA ranking algorithms, for the computation and visualization of the shortest path(s) connecting a pair of nodes, and for the network visualizations and clustering.



Field             Description
IMEI              IMEI code MS
called            called user
calling           calling user
date/time start   date/time start calling (GMT)
date/time end     date/time end calling (GMT)
type              sms, mms, voice, data etc.
IMSI              calling or called SIM card
CGI               Lat. long. BTS company

Table 1 An example of the structure of a log file.

3.2 Data Import

In the context of real-world investigations, mobile phone service providers, upon request by judicial authorities, release data logs, normally as text files with space- or tab-separated fields (a CSV-like format). A typical log file contains, at least, the values shown in Table 1.

Similarly, information about owners of SIM cards, dealers of SIM cards and operations such as activation, deactivation and number portability is provided by the service providers as additional material to support the investigation activities. Log file formats produced by different companies are heterogeneous. LogAnalysis first parses these files and converts the data into GraphML format. GraphML is a valid and well-formed XML format, containing all nodes and weighted edges, each weight representing the frequency of phone calls between two adjacent nodes. GraphML has been adopted because of both its extensibility and its ease of import into different SNA toolkits and graph drawing utilities.

3.3 Data Normalization/Cleaning

Data clean-up usually means the deletion of redundant edges and nodes. This step is very important since datasets often contain redundant information, which clutters the graph visualization and biases statistical measures. In these circumstances, redundant edges between the same two nodes are collapsed and a coefficient (i.e., an edge weight) is attached, which expresses the number of calls. Our tool normalizes data after reading and parsing log files, whichever format they have been provided in among the standard formats (i.e., fixed-width text, delimited, CSV, and more) used by mobile service providers.

6 http://prefuse.org/

7 http://jung.sf.net/

4 Eyes on some Features of LogAnalysis



In this section we highlight some of the main features of LogAnalysis that have been inspired both by forensic analysis and by social network analysis. In particular, in Section 4.1 we point out the data exploration features provided by our tool. Subsequently, in Section 4.2 we discuss the role and functioning of a set of centrality measures implemented in LogAnalysis that can be exploited to assess the importance of actors of mobile telecom networks. Furthermore, in Section 4.3 the layout models adopted in our tool are described, focusing on the novelties introduced by LogAnalysis with respect to general-purpose SNA tools.



4.1 Data Exploration 

The main goal of LogAnalysis is to support forensic detectives in the exploration of data provided by mobile phone service providers about the phone traffic activity of particular individuals of interest for the forensic investigations. This support is given by means of an interactive visual representation of the phone traffic network. To this purpose, individuals are identified by means of their phones and are represented by nodes of a graph. The phone calls, instead, represent the interactions among actors and for this reason they are captured as the edges of the same graph. More formally, the structure of phone traffic is described in terms of directed graphs G = (V, E), where V is the set of telephone numbers (nodes) and E is the set of calls (edges) among the nodes. The edges, directed and weighted, show the direction (incoming or outgoing) and the number of phone calls between the various pairs of adjacent nodes.
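The graph model just described can be made concrete with a small Python sketch. It is an illustration only, with a hypothetical function name and sample data: repeated calls between the same ordered pair of numbers are collapsed into one directed edge whose weight is the number of calls.

```python
from collections import Counter


def build_call_graph(calls):
    """Build G = (V, E) from an iterable of (calling, called) pairs.

    V is the set of telephone numbers; E maps every ordered pair that
    occurs in the data to its number of calls (the edge weight).
    """
    edges = Counter(calls)
    nodes = {number for pair in edges for number in pair}
    return nodes, dict(edges)


# Hypothetical call list: A calls B twice, B calls A once, B calls C once.
calls = [("A", "B"), ("A", "B"), ("B", "A"), ("B", "C")]
V, E = build_call_graph(calls)
```

Keeping the two directions of a pair as separate keys preserves the distinction between incoming and outgoing calls.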

LogAnalysis is able to manage phone traffic networks of up to hundreds of thousands of elements and log files of up to millions of entries. However, in our experience, a meaningful interactive visual representation of these data is viable for networks of up to some thousands of elements. To this purpose, in Section 5 we describe a number of case studies inspired by real investigations whose network includes thousands of elements, and in which LogAnalysis played a fundamental role in the successful conclusion of the investigation.

In detail, one of the most useful features of LogAnalysis is that it is able to identify and visually highlight those actors in the network that play a crucial role in the communication dynamics. This is done by exploiting the centrality measures provided by Social Network Analysis (described in the next section). On the other hand, a single visual layout may not be sufficient to bring out all the required information. For example, different visual representations would help detectives to reach additional insights about the data, the dynamics of the phone traffic network and the activities of the actors of the network. For this reason, LogAnalysis provides different interactive visual representations, by adopting several algorithms.



4.2 Centrality Measures 

LogAnalysis takes into account the concept of central- 
ity measure to highlight actors that cover relevant roles 
inside the analyzed network. Several notions of centrality have been proposed in recent years in the context of Social Network Analysis.

There are two fundamentally different classes of centrality measures in communication networks. The first class of measures evaluates the centrality of each node/edge in a network and is called point centrality measure. The second class is called graph centrality measure because it assigns a centrality value to the whole network. These techniques are particularly suited to study phone traffic and criminal networks.

In detail, in LogAnalysis we adopted four point central- 
ity measures (i.e., degree, betweenness, closeness and 
eigenvector centrality), to inspect the importance of 
each node of the network. 

The set of measures provided in our tool is a selection of those provided by Social Network Analysis [33]. It may not be sufficient to solve every possible task in phone call network analysis. In fact, for particular assignments it could still be necessary to use additional tools in support of LogAnalysis, and in further evolutions we plan to incorporate new centrality measures [9,1] if necessary.

For each centrality measure, the tool makes it possible to rank the nodes/edges of the network according to the chosen criterion. Moreover, LogAnalysis allows the analyst to select those nodes that are central according to the specified ranking, highlighting them and their relationships by exploiting the node-link layout techniques (discussed in the following). This approach makes it possible to focus the attention of the analysts on specific nodes of interest, showing their position and their role inside the network with respect to the others.
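The ranking step can be sketched in a few lines of Python. The function name `top_ranked` and the scores below are hypothetical; the sketch only assumes that some centrality measure has already produced one score per node.

```python
def top_ranked(scores, k):
    """Return the k nodes with the highest score, best first.

    Ties are broken by node identifier so that the ranking is deterministic.
    """
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


# Hypothetical centrality scores for three phone numbers.
scores = {"A": 0.2, "B": 0.9, "C": 0.5}
central = top_ranked(scores, 2)
```

The selected nodes are the ones a visual layout would then highlight together with their relationships.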

In the following we formally describe the centrality mea- 
sures used in LogAnalysis. 

They represent the centrality as an indicator of the ac- 
tivity of the nodes (degree centrality), of the control 
on other nodes (betweenness centrality), of the prox- 
imity to other nodes (closeness centrality) and of the 
influence of a node (eigenvector centrality). 

4.2.1 Degree centrality

The degree centrality of a node is defined as the number of edges adjacent to this node. For a directed graph $G = (V, E)$ with $n$ nodes, we can define the in-degree and out-degree centrality measures as

$$C_D(v)_{in} = \frac{d_{in}(v)}{n-1}, \qquad C_D(v)_{out} = \frac{d_{out}(v)}{n-1} \tag{1}$$

where $d_{in}(v)$ is the number of incoming edges adjacent to the node $v$, and $d_{out}(v)$ is the number of the outgoing ones.

Since a node can at most be adjacent to $n-1$ other nodes, $n-1$ is the normalization factor introduced to make the definition independent of the size of the network and to have $0 \le C_D(v) \le 1$.

In- and out-degree centrality indicate how much activity is going on and who the most active members are. A node with a high degree can be seen as a hub, an active node and an important communication channel.

We chose to include the degree centrality for a number of reasons. First of all, its calculation is computationally cheap even on large networks. Furthermore, in the context of phone call networks it could be interpreted as the chance of a node to catch any information traveling through the network.

Most importantly, in this type of directed network, high values of in-degree are considered a reliable indicator of a form of popularity/importance of the given node in the network; on the contrary, high values of out-degree are interpreted as a form of gregariousness of the given actor with respect to the contacted individuals.
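A minimal Python sketch of the normalized in- and out-degree centrality follows. It assumes that parallel calls have already been collapsed, so that each ordered pair appears at most once, and that the network has at least two nodes; the function name and the sample graph are hypothetical.

```python
def degree_centrality(nodes, edges):
    """Return {node: (in_centrality, out_centrality)} for a directed graph.

    `edges` is a collection of distinct (source, target) pairs.
    Both values are normalized by n - 1, where n is the number of nodes.
    """
    n = len(nodes)
    indeg = {v: 0 for v in nodes}
    outdeg = {v: 0 for v in nodes}
    for source, target in edges:
        outdeg[source] += 1
        indeg[target] += 1
    return {v: (indeg[v] / (n - 1), outdeg[v] / (n - 1)) for v in nodes}


# Hypothetical network in which B is contacted by everybody else.
nodes = {"A", "B", "C", "D"}
edges = {("A", "B"), ("C", "B"), ("D", "B"), ("B", "A")}
centrality = degree_centrality(nodes, edges)
```

In this example B reaches the maximum in-degree centrality, since all the other three nodes call it.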

4.2.2 Betweenness centrality

The communication between two non-adjacent nodes 
might depend on the others, especially on those on the 
paths connecting the two nodes. These intermediate el- 
ements may wield strategic control and influence on 
many others. 

The core idea of this centrality measure is that an actor is central if she lies along the shortest paths connecting other pairs of nodes. The betweenness centrality of a node $v$ can be defined as

$$B_C(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \tag{2}$$

where $\sigma_{st}$ is the number of shortest paths from $s$ to $t$ and $\sigma_{st}(v)$ is the number of shortest paths from $s$ to $t$ that pass through a node $v$.

The importance of the betweenness centrality lies in its capacity to identify those nodes that convey information among different groups of individuals.

In fact, since its definition due to Freeman [14], the betweenness centrality has been recognized as a good indicator to quantify the ability of an actor of the network to control the communication between other individuals and, specifically for this reason, it has been included in LogAnalysis.

In addition, it has been exploited by Newman [53] to devise an algorithm to identify communities within a network. Its adoption in phone traffic networks is crucial in order to identify those actors that allow the communication among different (possibly criminal) groups.
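One way to compute the betweenness centrality on an unweighted directed graph is Brandes' algorithm, sketched below in Python. The paper does not state which algorithm LogAnalysis uses, so this is an assumption made for illustration; the sample graph is hypothetical, and every node must appear as a key of `adj`, including nodes without outgoing calls.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm; `adj` maps each node to the list of its successors."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Hypothetical network: A reaches C through either B or D.
adj = {"A": ["B", "D"], "B": ["C"], "D": ["C"], "C": []}
scores = betweenness_centrality(adj)
```

Here B and D each lie on one of the two shortest paths from A to C, so each receives half of that pair's contribution.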

4.2.3 Closeness centrality

Another useful centrality measure that has been adopted in LogAnalysis is called closeness centrality. The idea is that an actor is central if she can quickly interact with all the others, not only with her first neighbors [23]. The notion of closeness is based on the concept of shortest path (geodesic) $d(u, v)$, the minimum number of edges traversed to get from $u$ to $v$. The closeness centrality of the node $v$ is defined as

$$C_C(v) = \frac{1}{\sum_{u \in V} d(u, v)} \tag{3}$$



Such a measure is meaningful for connected graphs only, 
assuming that d(u, v) may be equal to a finite value. 

In the context of criminal networks, this measure highlights entities with the minimum distance from the others, allowing them to pass on and receive communications more quickly than anyone else in the organization. For this reason, the adoption of the closeness centrality is crucial in order to highlight, inside the network, those individuals that are closer to others (in terms of phone communications).

In addition, high values of closeness centrality in this type of communication network are usually regarded as an indicator of the ability of the given actor to quickly spread information to all other actors of the network. For this reason, the closeness centrality has been included in the set of centrality measures adopted by LogAnalysis.
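The closeness centrality can be illustrated with a breadth-first search. The Python sketch below takes the closeness of a node as the reciprocal of the sum of its shortest-path distances to all other nodes, and it assumes, for simplicity, a connected undirected graph with at least two nodes; the function name and the sample graph are hypothetical.

```python
from collections import deque


def closeness_centrality(adj, v):
    """Closeness of v: 1 / (sum of shortest-path distances between v and the others).

    `adj` maps each node to its neighbours in a connected, undirected graph.
    """
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    if len(dist) != len(adj):
        raise ValueError("graph is not connected")
    return 1 / sum(dist.values())


# Hypothetical star-shaped network centred on C.
adj = {"C": ["A", "B", "D"], "A": ["C"], "B": ["C"], "D": ["C"]}
center = closeness_centrality(adj, "C")
leaf = closeness_centrality(adj, "A")
```

The centre is at distance 1 from every other node, whereas a leaf is at distance 1 from the centre and 2 from the other leaves, so the centre obtains the higher score.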

4.2.4 Eigenvector centrality

Another way to assign the centrality to an actor of the network in LogAnalysis is based on the idea that if a node has many central neighbors, it should be central as well. This measure is called eigenvector centrality and establishes that the importance of a node is determined by the importance of its neighbors.

The eigenvector centrality of a given node $v_i$ is

$$C_E(v_i) \propto \sum_{u \in N_i} A_{iu}\, C_E(u) \tag{4}$$

where $N_i$ is the neighborhood of the given node $v_i$, and $x \propto Ax$, which implies $Ax = \lambda x$. The centrality corresponds to the top eigenvector of the adjacency matrix $A$, i.e., the eigenvector associated with its largest eigenvalue.

In the context of telecom networks, eigenvector central- 
ity is usually regarded as the measure of influence of 
a given node. High values of eigenvector centrality are 
achieved by actors who are connected with high-scoring 
neighbors, which in turn, inherited such an influence 
from their high-scoring neighbors and so on. 

This measure reflects well an intuitive and important feature
of communication networks, namely the diffusion of influence;
for this reason we decided to include the eigenvector
centrality in LogAnalysis.
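
The idea that a node is central when its neighbors are central can be sketched with a power iteration. The Python code below is a minimal example under our own assumptions (undirected, unweighted, connected graph given as an adjacency dictionary; scores rescaled so that the maximum is 1); it is not taken from LogAnalysis.

```python
def eigenvector_centrality(adjacency, iterations=100, tolerance=1e-9):
    """Approximate the top eigenvector of the adjacency matrix by power iteration."""
    nodes = list(adjacency)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Each node collects the scores of its neighbours. Adding the node's
        # own score (i.e. iterating on A + I) keeps the same eigenvectors
        # while avoiding oscillations on bipartite graphs.
        updated = {n: score[n] + sum(score[m] for m in adjacency[n]) for n in nodes}
        norm = max(updated.values()) or 1.0
        updated = {n: s / norm for n, s in updated.items()}
        if max(abs(updated[n] - score[n]) for n in nodes) < tolerance:
            return updated
        score = updated
    return score

# Hypothetical star network: one hub "c" connected to three leaves
star = {"c": ["x", "y", "z"], "x": ["c"], "y": ["c"], "z": ["c"]}
scores = eigenvector_centrality(star)
print(scores["c"], scores["x"])
```

On the star network the hub receives the maximum score and every leaf inherits a fraction of it, reflecting the influence passed on by the high-scoring neighbor.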

4.2.5 Clustering coefficient (transitivity)

The clustering (or transitivity) coefficient of a graph
measures the degree of interconnectedness of a network
or, in other words, the tendency of two nodes that are
not adjacent but share an acquaintance to get in contact
with each other. A high clustering coefficient indicates the
presence of a high number of triangles in the network.

The local clustering coefficient C_i for a node v_i is the
number of links among the nodes within its neighborhood
divided by the number of links that could possibly
exist among them

C_i = \frac{|\{e_{jk}\}|}{k_i (k_i - 1)} : v_j, v_k \in N_i, \; e_{jk} \in E \qquad (5)

where the neighborhood N_i of a node v_i is defined as
N_i = \{v_j : e_{ij} \in E \land e_{ji} \in E\}, while k_i (k_i - 1),
with k_i the number of nodes in N_i, is the number of links
that could exist among the nodes within the neighborhood.

It is well known in the literature [34] that communication
networks show high values of the clustering coefficient,
since they reflect the underlying social structure of contacts
among friends and acquaintances. Moreover, high values of the
local clustering coefficient are considered a reliable
indicator of nodes whose neighbors are very well connected
and among which a substantial amount of information may flow.
For this reason, LogAnalysis provides the possibility of
computing both the global clustering coefficient for any given
phone call network and the local clustering coefficient of any
given node.
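
A minimal Python sketch of the local clustering coefficient follows. It assumes an undirected network stored as a dictionary of neighbor sets, so that each link among neighbors counts in both directions and the ratio becomes 2L / (k (k - 1)), with L the number of undirected links among the k neighbors. The function name and the toy data are ours.

```python
from itertools import combinations

def local_clustering(adjacency, node):
    """Fraction of possible links among the neighbours of `node` that exist."""
    neighbours = set(adjacency[node]) - {node}
    k = len(neighbours)
    if k < 2:
        return 0.0  # no pair of neighbours, hence no possible triangle
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return 2.0 * links / (k * (k - 1))

# Hypothetical network: triangle a-b-c plus a pendant node d attached to a
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(local_clustering(toy, "a"), local_clustering(toy, "b"))
```

Node b, whose two neighbors are in contact with each other, reaches the maximum value, while node a is penalized by the pendant contact d.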



4.3 Layout Algorithms 

In this section we introduce the strategies of interac- 
tive visual representation of the phone traffic networks 
adopted in LogAnalysis. In detail, the graphical repre- 
sentation of phone relationships in LogAnalysis exploits 
features provided by two well-known toolkits, Prefuse 
and JUNG. 

4.3.1 Force-directed Model

The main visual representation strategy adopted in LogAnalysis
is the so-called force-directed model. It is computed using
the Fruchterman-Reingold algorithm [15], in which nodes repel
each other and edges act as springs. The resulting displacement
of nodes and links shows users clustered in groups, which can
be identified on the basis of their higher connectivity. The
Barnes-Hut algorithm [3] simulates an N-body repulsive system
in order to continuously update the positions of the elements.
The visualization is optimized interactively by modifying the
parameters relative to the tension of the springs. Nodes with
lower connectivity have greater tension, resulting in a
displacement of the elements of a group in orbital position
with respect to the central group. In LogAnalysis it is
possible to modify different parameters, for example the
spring constant, the gravitational force and the viscosity/drag
of forces. Figure 2 shows an example of the force-directed
visualization model.

4.3.2 Edge Betweenness Clusterer

We have found that the Fruchterman-Reingold layout, in
conjunction with the Edge Betweenness Clusterer [17],
allows the interactive discovery of the groups existing inside
the network (henceforth called clans) and of those individuals
acting as links among groups (hereafter called referents).
This feature is crucial because it allows forensic analysts to
highlight, with little effort, those clans whose activity
inside the phone call network may be suspect. Moreover, it
leads to additional insights, in particular regarding the
interconnections of these referents with each other and with
the clans.

More generally, the Edge Betweenness Clusterer, introduced
by Newman [23], is instrumental in the discovery of groups
(called communities in Social Network Analysis). This
algorithm takes into account the weights of
the edges in the network. In the particular scenario of 
the phone traffic networks, the concept of weights has 
already been defined as the number of phone communi- 
cations among individuals. To highlight the clans, Log- 
Analysis exploits this technique according to a specific 
visualization strategy, called visual aggregation. 

4.3.3 Visual Aggregation

LogAnalysis adopts two algorithms to detect aggrega- 
tions inside the network which represents the phone 
traffic. The first algorithm called Edge Betweenness Clus- 
terer has been previously introduced. To this purpose, 
instead of regarding the betweenness centrality associ- 
ated to a node, we consider the betweenness centrality 
of an edge, which is defined as the number of shortest 
paths connecting pairs of nodes traversing it. 

Fig. 2 Example of the force-directed visualization model. It is possible to put into evidence that groups of nodes repel each

In the context of visual aggregation, once the ranking of the
edges is calculated, the algorithm simulates the deletion of
the edges with the highest centrality, one by one, which has
the effect of clustering the network into different groups
(i.e., clans) that are weakly coupled with each other but
densely interconnected internally. The functioning of this
algorithm is based on the intuition that edges with high
centrality connect groups characterized by high
interconnectedness among their members and few outgoing
connections. The Edge Betweenness Clusterer has been proved to
work well in the context of social networks. To the best of our
knowledge, this is the first attempt to adopt this strategy to
identify clans inside phone traffic networks.
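
The edge-removal procedure described above can be sketched in a few lines of Python. The code is a simplified, unweighted illustration written for this text (it ignores edge weights and self-loops, and recomputes the ranking after every deletion); the function names, the `target` parameter and the toy network are assumptions, not the JUNG or LogAnalysis implementation.

```python
from collections import deque

def edge_betweenness(adjacency):
    """Count the shortest paths crossing each undirected edge (Brandes-style)."""
    score = {}
    for source in adjacency:
        order, parents = [], {n: [] for n in adjacency}
        paths = dict.fromkeys(adjacency, 0)
        paths[source] = 1
        depth = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search counting shortest paths
            node = queue.popleft()
            order.append(node)
            for nxt in adjacency[node]:
                if nxt not in depth:
                    depth[nxt] = depth[node] + 1
                    queue.append(nxt)
                if depth[nxt] == depth[node] + 1:
                    paths[nxt] += paths[node]
                    parents[nxt].append(node)
        credit = dict.fromkeys(adjacency, 0.0)
        for node in reversed(order):  # accumulate from the farthest nodes back
            for parent in parents[node]:
                share = paths[parent] / paths[node] * (1.0 + credit[node])
                edge = frozenset((parent, node))
                score[edge] = score.get(edge, 0.0) + share
                credit[parent] += share
    # every pair of endpoints has been counted once from each end
    return {edge: value / 2.0 for edge, value in score.items()}

def components(adjacency):
    """Return the connected components as a list of node sets."""
    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(group)
    return groups

def split_into_clans(adjacency, target):
    """Delete the highest-betweenness edge until `target` groups appear."""
    graph = {n: set(nbrs) for n, nbrs in adjacency.items()}
    while len(components(graph)) < target and any(graph.values()):
        scores = edge_betweenness(graph)
        a, b = max(scores, key=scores.get)
        graph[a].discard(b)
        graph[b].discard(a)
    return components(graph)

# Hypothetical network: two triangles joined by the single link 3 - 4
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(split_into_clans(toy, 2))
```

In the toy network the link 3 - 4 carries all the shortest paths between the two triangles, so it is removed first and the two clans emerge; nodes 3 and 4 play the role of referents.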

The second algorithm, known as Newman's community
identification algorithm [23], is a variant of hierarchical
agglomerative clustering (it is also adopted in
Vizster [H]). Regardless of the adopted algorithm,
LogAnalysis visually presents the identified clans by
surrounding all members with a translucent convex hull
(see Figure 3).

By expanding the action of filtering one can obtain in- 
teresting visualizations. Groups comprised of a single 
node (i.e., monadic clusters) which satisfy the filtering 
condition are compressed and shown as a star. 

Moreover, interactive cluster discovery is available in
LogAnalysis. Users can suppress an arbitrary number of
edges to discover strategic groups, together with the links
among groups. Target edges are chosen according to
the Edge Betweenness Clusterer algorithm. Labels of nodes
belonging to the same cluster bear the same color. Different
colors identify elements not belonging to a cluster.

4.3.4 Radial Tree Layout 

The third layout algorithm introduced in LogAnalysis is
called Radial Tree. It places the elements of a graph
in radial positions and defines several levels upon
concentric circles with progressively increasing radii. The
algorithm developed by Ka Ping Yee et al. [38] also
puts nodes in radial positions, but gives the possibility
of varying positions while preserving both orientation
and order.

According to that technique, a selected element is placed
at the center of the canvas and all the other nodes are
subsequently placed upon concentric circles with radii
increasing outwards. This visualization strategy is
instrumental in the context of forensic analysis because it
allows detectives to focus their attention on a suspect and
to have a close look at his or her connections.
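
The placement on concentric circles can be illustrated with the following Python sketch. It is a deliberately simplified variant: nodes are assigned to rings by their hop distance from the selected element and spread evenly on each ring, whereas the algorithm by Yee et al. allocates angular sectors per subtree and animates the transitions. Function name, `ring_spacing` and the toy network are assumptions.

```python
import math
from collections import deque

def radial_positions(adjacency, focus, ring_spacing=1.0):
    """Place `focus` at the origin and the other reachable nodes on rings."""
    level = {focus: 0}
    queue = deque([focus])
    while queue:  # hop distance from the selected node decides the ring
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in level:
                level[nxt] = level[node] + 1
                queue.append(nxt)
    rings = {}
    for node, depth in level.items():
        rings.setdefault(depth, []).append(node)
    positions = {focus: (0.0, 0.0)}
    for depth, members in rings.items():
        if depth == 0:
            continue
        radius = depth * ring_spacing
        for index, node in enumerate(members):
            angle = 2.0 * math.pi * index / len(members)
            positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions

# Hypothetical ego network of a suspect "s"
toy = {"s": ["a", "b"], "a": ["s", "c"], "b": ["s"], "c": ["a"]}
for node, (x, y) in radial_positions(toy, "s").items():
    print(node, round(x, 2), round(y, 2))
```

Direct contacts of the suspect land on the first circle and contacts of contacts on the second one, which is the property the analyst relies on when reading the layout.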

The interactive visualization using the Radial Tree
layout is shown in Figure 4. The interface supports
filtering and searching elements within the network;
a forensic analyst could select a specific node, which
is placed at the center of the canvas. Nodes lying on
the circumference of concentric circles centered on that
node could also be progressively displaced from the
selected one. Moreover, edges are drawn with different
thickness, calculated with respect to the number
of calls between the connected nodes. The overlapping of
nodes may be avoided by superimposing a force-directed
visualization on the radial tree algorithm.

the Edge Betweenness Clusterer to divide the network in different clans on the base of the interactions among members.

A useful extension that we implemented, shown in Figure 5,
is called Radial Exploded layout. By selecting a specific
node, the analyst can focus on its acquaintances, which are
displayed using a radial layout. The characteristic of this
exploded strategy is that it focuses only on a specific
suspect and highlights its links.



5 Case Study 

5.1 Aim of the Experimentation 

The aim of the current section is twofold. First of all, in
order to highlight the potential of LogAnalysis and the
features it provides to forensic analysts, we discuss in more
detail a number of examples, including the applicability of
some of the centrality measures discussed above to the
assessment of the importance of actors in phone call networks,
the application of visualization techniques to highlight
patterns of interactions among individuals in the networks, etc.

In addition, we underline that LogAnalysis has already been
adopted by one of the authors during several real-world
criminal investigations. To this purpose, in this section we
also report some data about these cases. In particular, we
provide details regarding several small and large case studies
(including the details about the datasets of phone call
networks adopted during the investigations) in which
LogAnalysis has been adopted to obtain additional insights
regarding the structure of the networks. Finally, this section
is instrumental to introduce some additional features,
deliberately not discussed before, in order to understand
their usage in the context of a real investigation.

We tested our tool against different datasets (reported
in Table 2), whose size ranged from about 4 thousand to
about 8 million mobile communications.

One important feature we discovered in criminal phone call
networks is the growth rate. We found that, even though the
number of entries in the log files grows, the corresponding
size of the network grows more slowly. For instance, case 7
in Table 2 counts 7,567,119 log entries but only 912 nodes
and 2,407 edges. In fact, the analysis performed by forensic
investigators focuses on the network of individuals that are
already suspected of being involved in criminal activities, or
of being part of criminal organizations (i.e., the clans) or
terrorist groups. This results in a network whose structure
grows slowly and comprises a relatively small number of
nodes/edges (with different weights) with respect to the
number of phone calls reported by the log files.
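
The gap between the number of log entries and the size of the network follows from the way calls are aggregated: repeated calls between the same pair of phones only increase the weight of an existing edge. The Python sketch below illustrates this under the assumption that each log entry can be reduced to a (caller, callee) pair; the record format and the function name are hypothetical.

```python
from collections import Counter

def build_call_network(log_entries):
    """Aggregate (caller, callee) records into nodes and weighted directed edges."""
    weights = Counter((caller, callee) for caller, callee in log_entries)
    nodes = {phone for edge in weights for phone in edge}
    return nodes, weights

# Hypothetical log with five entries among three phones
log = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "C"), ("A", "B")]
nodes, weights = build_call_network(log)
print(len(log), len(nodes), len(weights))
```

Five log entries collapse into three nodes and three weighted edges; additional calls between the same phones would leave the network size unchanged.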

In our real criminal investigations on phone call networks,
the analyst usually started with the study of the ego
networks of the individuals already under suspicion, i.e.,
those whose involvement in the criminal activities had been
previously proved. The main goal of the analyst was to
disclose additional information about the underlying criminal
organization. For example, one important task was to identify
other individuals whose activity was suspect, in order to
hypothesize their complicity with actors whose involvement in
the criminal organization was established. This step was
fundamental because, by identifying a small number of
additional possible suspects, it was possible to proceed with
other "traditional" investigation methodologies, which would
otherwise not have been feasible (in terms of time and cost
constraints).
Case no.               1       2         3         4         5         6           7           8
No. nodes            148     170       381       461       320       543         912         702
No. edges            204     212       688       811       776     1,229       2,407       1,846
No. log entries    4,910   8,447   125,679   250,886   280,466   589,512   7,567,119   8,023,945

Table 2 Datasets adopted during real forensic investigations using LogAnalysis. We highlight that the number of entries of 
the log files grows at a very different rate with respect to the number of nodes and edges in the network. This is a typical 
feature of the criminal phone call networks. 



5.2 Further Details and Simple Use Case 

In the following we discuss an example use case that 
describes the usage of LogAnalysis during criminal in- 
vestigations. As introduced above, analyzed data repre- 
sent the phone call network of individuals suspected of 
belonging to criminal organizations. The period of anal- 
ysis usually coincides with the commission of certain 
serious crimes. The adoption of our tool is instrumen- 
tal to prove that those criminal facts have been planned 
and committed by the considered suspects. 

Upon request by the judicial authorities, mobile phone
service providers release to the police force all data logs
about a certain set of suspected actors. After the import
of phone call data in LogAnalysis, the analysis may start
with the visualization of the phone call network using the
force-directed layout. This is helpful to get a picture of
the phone call network and of the connections of suspected
actors with each other and with other external individuals
(see Figure 2). Unless the network comprises thousands of
actors, which requires a manual process of filtering, our
tool is able to provide a meaningful graphical visualization
of the phone call network. One advantage of the force-directed
layout is the possibility of easily identifying clusters of
actors within the network.

Fig. 5 Example of exploded Radial View layout. A selected node is put into the center and the neighborhood is presented in

In order to improve the visualization, it is possible to
apply some simple filters. For example, once the forensic
investigator identifies an actor of interest, just by clicking
on it LogAnalysis highlights the individuals to whom the given
actor is connected, and those with whom it shares most of its
contacts. In that case, the number of incoming connections
represents the popularity of a given actor, and the number of
outgoing connections represents its gregariousness. It is thus
easy to identify the individuals towards whom this actor is
gregarious, and those over whom he/she exercises some
influence.
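
Popularity and gregariousness, read as in-degree and out-degree, can be computed directly from the directed edges. The short Python sketch below counts distinct contacts; this choice, the function name and the sample edges are assumptions made for illustration.

```python
def popularity_and_gregariousness(edges):
    """Map each actor to (in-degree, out-degree), counting distinct contacts."""
    incoming, outgoing = {}, {}
    for caller, callee in edges:
        outgoing.setdefault(caller, set()).add(callee)
        incoming.setdefault(callee, set()).add(caller)
    actors = set(incoming) | set(outgoing)
    return {a: (len(incoming.get(a, ())), len(outgoing.get(a, ()))) for a in actors}

# Hypothetical directed calls; the repeated pair ("A", "B") counts once
calls = [("A", "B"), ("C", "B"), ("B", "D"), ("A", "B")]
degrees = popularity_and_gregariousness(calls)
print(degrees["B"])  # (popularity, gregariousness) of actor B
```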

Double-clicking on a given actor, the layout manager 
exploits the force-directed radial layout (see Figure Eh. 
In such a way it is possible not only to have a picture
of all the contacts of a given actor, but also to highlight 
the intensity with which those communications occur. 
In addition, it is possible to put into evidence the af- 
filiation of each actor to a given cluster, identified by 
different colors (see Figure [HJ). 

LogAnalysis is particularly suited to assess the presence
of clusters in the given phone call network and to visually
highlight their structure (see Figure 3). This functionality
is helpful to establish the role of a given set of actors
inside a given group and to understand the structural and
hierarchical organization of a possible criminal network. To
assess certain hypotheses on the hierarchical structure of a
given criminal network, the forensic analyst may exploit the
tool depicted in Figure 6, which gives an immediate picture
of the intensity of the communications among a set of actors,
highlighting those connections whose relevance for the
investigation is higher.



5.3 Overall Metric Tool 

In the following we are going to introduce additional 
features of LogAnalysis that are instrumental in the 
context of real-world criminal investigations. 

An important and useful feature provided by our tool 
discussed in this case study is related to the possibil- 
ity of calculating global quantitative metrics on the 
nodes/edges of the network. In particular, it is possible 
to evaluate some features usually adopted in SNA [25] 
such as: i) overall network metrics (i.e., number of nodes 
and edges, density, diameter); ii) node rankings (i.e., de- 
gree, betweenness, closeness and eigenvector centrality) 
and, finally iii) edge rankings (by means of weights). 
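
Two of the overall network metrics listed above, density and diameter, can be sketched as follows. The Python code assumes an undirected, unweighted graph stored as an adjacency dictionary, uses 2m / (n (n - 1)) for the density, and takes the diameter as the longest shortest path among mutually reachable nodes; names and toy data are ours.

```python
from collections import deque

def eccentricity(adjacency, source):
    """Largest hop distance from `source` to any node it can reach."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return max(distance.values())

def overall_metrics(adjacency):
    """Return node count, edge count, density and diameter of the graph."""
    n = len(adjacency)
    m = sum(len(nbrs) for nbrs in adjacency.values()) // 2  # each edge is stored twice
    density = 2.0 * m / (n * (n - 1)) if n > 1 else 0.0
    diameter = max(eccentricity(adjacency, v) for v in adjacency)
    return {"nodes": n, "edges": m, "density": density, "diameter": diameter}

# Hypothetical chain a - b - c - d
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(overall_metrics(chain))
```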

This first step helped the analyst to gain a first insight
into the structure of the network, in particular by
highlighting individuals whose centrality values were suspect
with respect to the others. Similarly, the same quantitative
evaluation highlights those connections (i.e., phone
communications) that occur more frequently and those actors
that are more active in the network. In Table 3 we report all
the metrics calculated on the case studies 1-4.

Table 3 Overall metrics calculated on the datasets of our four case studies. (*) In LogAnalysis the betweenness centrality is
not normalized.



5.4 Data Visualization 

Figures 2, 4 and 5 show some details about the LogAnalysis
user interface. Once imported, data about the
phone traffic network are visually represented by us- 
ing the default view (i.e., by means of the aforemen- 
tioned force-directed layout). Each node represents a 
cell phone, and edges indicate communications among 
them. On the left, a control panel provides tools and 
filters in order to tune the visualization of the network. 

Using the available dynamic filters, it is possible to hide
or highlight those nodes (or connections) which satisfy
specific criteria. Moreover, analysts could interact with
the graph, for example moving, hiding or emphasizing
specific elements, in order to dynamically re-arrange the
structure of the graph.

The visualization algorithm adopts a weighted representation
of edges, drawing those edges with higher weights by means of
thicker lines. Standard nodes are drawn in light blue by
default. Color filters can be defined by users according to
specific conditions. For example, in this case study,
"light-green" nodes reflect the "arrested" condition, while
"light-red" nodes match the "sub tree control" filter.

All these tools are provided in order to produce more
readable network graphs. It is additionally possible to
adopt "distance" filters, excluding from the visualization
all the nodes farther from the selected one than a
user-defined threshold. This is particularly helpful
if the network under investigation is very large,
comprising thousands of elements.

The optimal network visualization is a combination of
both manual and automatic arrangements. First, it is
possible to automatically pan and zoom so that the whole
network fits the display area (this particular approach
may or may not be appropriate, depending on the size
of the network or on the specific task the analyst would
like to perform).

In addition, the display automatically pans when a new
node is expanded, centering on the newly expanded network.
The tool also provides manual panning and zooming features
in order to better satisfy user needs. Moreover, it is
possible to choose which kind of "labels" should be
visualized: the "node-id", the picture (if available), or
both. Even though the case study presented in this paper is
inspired by a real investigation, for privacy reasons the
pictures in the figures displayed in this work are
fictitious. In real investigations, these pictures are the
mugshots of suspects (where available).

5.4.1 Edge Decorated

In this section we focus on a specific visualization
feature, called edge decorator. The technique we propose is
well suited to phone traffic networks and, in our opinion,
more widely to Social Network Analysis.

In detail, this strategy, which has been introduced in
LogAnalysis, produces graphs not only according to the
force-directed layout, but also by adopting different colors
for both nodes and edges. To this purpose, we recall that the
node color is given by the clan each node belongs to. The edge
color, instead, is calculated by means of a weight function
(in our case, the number of calls between a pair of nodes).
Edges are annotated with the weights associated to both
directions (in- and out-degree). The interval in which the
weights lie is normalized, depending on the characteristics of
the network. This strategy results in the adoption of an edge
color code, which in our case study has been calculated as
follows: (i) gray for weight < 10; (ii) green for
10 ≤ weight ≤ 60; (iii) fuchsia for 61 ≤ weight ≤ 100;
(iv) red for weight > 100.
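
The color code of this case study amounts to a simple threshold function on the edge weight. The Python sketch below assumes integer weights (numbers of calls) and assigns the boundary values 10, 60 and 100 to the green and fuchsia classes; this reading of the boundaries is our assumption, and the function name is hypothetical.

```python
def edge_color(weight):
    """Map the number of calls on an edge to the color used to draw it."""
    if weight < 10:
        return "gray"
    if weight <= 60:
        return "green"
    if weight <= 100:
        return "fuchsia"
    return "red"

# Hypothetical weights of four edges
for calls in (3, 25, 80, 240):
    print(calls, edge_color(calls))
```

Since the weight intervals depend on the characteristics of the network, in practice the thresholds would be parameters rather than constants.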

The main advantage of introducing a color code for nodes
and edges is the possibility of easily identifying the
strongest relationships among hundreds, or even thousands,
of nodes and edges. During the real investigation, this
feature proved helpful in giving the analyst a clear picture
of the intensity of the communications among different actors
of the network at a glance. Finally, the possibility of
visually highlighting those communication paths that occur
more frequently than average is instrumental because it
presents in a visual way the information provided by the
overall metric tool.

Fig. 6 Example of edge decorated. This feature classifies the
edges with respect to a weight function and adopts different
colors according to these weights. It is fundamental in order
to give to the analyst an immediate picture that summarizes
the intensity of the communications through the network.

5.4.2 Shortest Path Finder

Another useful visualization tool provided by LogAnalysis
is the shortest path finder. The usage of the shortest
path finder is crucial to highlight those paths that are
optimal in order to spread information through the network.
In detail, the tool highlights the nodes and edges involved
in the shortest path between any given pair of nodes of the
network. This representation makes it possible to highlight
relationships among individuals belonging to distant groups
in the graph. Figure 7 presents the usage of the tool. In
this specific case, the analyst was interested in
understanding the most efficient communication route between
nodes 289 and 379, two possible suspects. Even though these
nodes appear to be distant, the considered network contains a
relatively short path, consisting of only 4 hops, that
connects these suspects. Another essential piece of
information that this tool reveals is that, usually,
information can flow efficiently through those nodes that are
more central in their respective clans, and that there usually
exists a small number of referents that convey most of
the communications.
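
On an unweighted network, a shortest path between two nodes can be found with a breadth-first search that remembers how each node was reached. The Python sketch below is a generic illustration, not the routine used by LogAnalysis; the adjacency dictionary and the node identifiers are invented for the example.

```python
from collections import deque

def shortest_path(adjacency, source, target):
    """Return one shortest path from source to target, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back through the recorded parents
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical network with two alternative routes between nodes 1 and 5
toy = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
path = shortest_path(toy, 1, 5)
print(path, "hops:", len(path) - 1)
```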




Fig. 7 The Shortest Path Viewer. It is fundamental when the analyst would like to understand the shortest ways of commu- 
nications in the network. Usually, criminal organizations are structured in order to optimize the number of communications 
among members to efficiently disseminate information. This is possible by following short paths of communications that can 
be discovered by using this tool. 



5.5 Time Filtering 



A powerful filter included in LogAnalysis, which deserves
a specific explanation, is the time filter. Starting from
the assumption that phone call networks are time-dependent,
and that the structure of the network could change
accordingly, we introduced in our tool the possibility of
"filtering" the structure of the network with respect to
specific temporal constraints. As shown in Figure 8, it is
possible to select a time interval by using a slider which
spans the whole temporal range covered by the log file. The
structure of the network is filtered accordingly, removing
all the edges representing connections (i.e., phone calls)
which did not take place in that specific time window, and
isolating (or hiding) those nodes not involved in the network
at that given time. Additionally, if the user modifies the
time interval, the nodes involved are automatically "engaged"
or detached and, thanks to the force-directed algorithm, are
attracted into or pushed out of the network. The time filter
has proved to be a very powerful feature. Its adoption helps
the analyst in identifying those communications that happened
in a specific time window (say, for example, a day) and the
structure of the graph during the given interval. Such a
possibility heightens the capability of the detective to
understand the structure of a criminal organization and its
evolution over time. In fact, because the connections may
spread over a long time interval, it is fundamental for the
investigator to understand at what time the given graph was
already reflecting, for example, the structure of a clan or
the presence of a particular referent in the network.
Similarly, the possibility of dynamically visualizing the
effect of engaging or detaching nodes as the time filter is
modified is crucial in order to highlight those nodes that are
involved, during a specific time window, in the phone
traffic network.
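
The effect of the time filter on the network structure can be sketched as a selection of the calls falling inside the chosen window, followed by a rebuild of the weighted edges. In the Python example below the (caller, callee, timestamp) record format, the function name and the dates are hypothetical.

```python
from datetime import datetime

def filter_by_time(calls, start, end):
    """Keep only the calls placed between start and end (inclusive).

    Returns the set of phones still involved and the weighted edges.
    """
    edges = {}
    for caller, callee, when in calls:
        if start <= when <= end:
            edges[(caller, callee)] = edges.get((caller, callee), 0) + 1
    active = {phone for edge in edges for phone in edge}
    return active, edges

# Hypothetical call records
calls = [
    ("A", "B", datetime(2010, 5, 1, 9, 0)),
    ("A", "B", datetime(2010, 5, 1, 21, 30)),
    ("B", "C", datetime(2010, 5, 3, 8, 15)),
]
active, edges = filter_by_time(calls, datetime(2010, 5, 1), datetime(2010, 5, 2))
print(active, edges)
```

Phone C, which only appears in a call outside the window, is detached from the filtered network, while the edge from A to B keeps a weight of two.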



5.6 Time Flow Analyzer 

The last visual tool which has been included in LogAnalysis
is related to the time filtering features previously
presented, but it is detached from the graph representation
of the phone traffic networks. In fact, the Time Flow
Analyzer (see Figure 9) considers each single phone call as
an event, graphically represented in a time-line which covers
a specific, user-defined interval of time. A time-dependent
visualization is crucial in the scenario of forensic
investigations: it allows information and event flows to be
organized in a visual manner, in order to highlight the degree
of correlation of specific events (in our case the phone
connections).

Fig. 8 The time filter feature. This tool is able to improve the capabilities of an analyst because it allows to specify a particular
time window and to investigate how the structure of the network changes accordingly. Nodes are dynamically engaged or
detached according to the time information about the phone call, dynamically altering the structure of the network.

In the Time Flow Analyzer included in LogAnalysis, a
bi-dimensional space presents the days on the x-axis and the
hours on the y-axis. Each event is represented by a colored
square, whose color depends on the type of communication
(i.e., sent/received calls, SMS, other types of
communications, etc.). It is possible to apply several
filters, in order to select only specific events:

All  All the phone events;

1-2  Sent/received calls;

6-7  Sent/received SMS;

     All the other types of communications.
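
The day/hour grid of the Time Flow Analyzer can be approximated by grouping the events into (day, hour) cells, optionally restricted to some event types. The Python sketch below assumes events given as (timestamp, kind) pairs; the type labels "call" and "sms", the function name and the dates are placeholders and do not correspond to the filter codes used by the tool.

```python
from collections import defaultdict
from datetime import datetime

def time_flow_cells(events, kinds=None):
    """Group events into (day, hour) cells, keeping only the requested kinds."""
    cells = defaultdict(list)
    for when, kind in events:
        if kinds is None or kind in kinds:
            cells[(when.date(), when.hour)].append(kind)
    return dict(cells)

# Hypothetical events
events = [
    (datetime(2010, 5, 1, 9, 5), "call"),
    (datetime(2010, 5, 1, 9, 40), "sms"),
    (datetime(2010, 5, 2, 23, 0), "call"),
]
for (day, hour), kinds_in_cell in time_flow_cells(events).items():
    print(day, hour, kinds_in_cell)
```

Each cell corresponds to one square of the grid, with the day selecting the column and the hour selecting the row.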



Moreover, it is possible to zoom in/out the time interval
in order to obtain additional insights about connections
of events. Finally, the Time Flow Analyzer allows the
analysts to query the data in order to retrieve information
about specific events or even about specific phone numbers,
etc. The adoption of this tool during real investigations is
crucial to identify single events that set off cascades of
related events. In detail, the time-dependent visualization
allows the analyst to highlight those communications that
triggered, in cascade, additional communications to other
actors. For example, it is possible to specify small time
windows that coincide with specific criminal events, in order
to emphasize the phone connections that happened during that
interval and the actors involved, with a heightened
probability of finding additional suspects or individuals
involved in the criminal organization.

Temporal analysis is extremely relevant in the context of
phone call investigations. The Time Flow Analyzer feature of
LogAnalysis allows forensic analysts to highlight those
fundamental communications that happened in critical periods
of interest for a given investigation. For example, Figure 9
shows that a substantial number of phone calls happened
before, during and after the commission of a serious crime,
among those components of the criminal organization
highlighted by means of the network structure of the phone
calls. The temporal analysis, although not directly
represented by means of networks, is closely interconnected
to the structure and the evolution of the phone call network
itself, and the Time Flow Analyzer tool is instrumental to
highlight and understand this critical dependency.

Fig. 9 The Time Flow Analyzer tool. This tool is helpful to
consider the time-dependence of events (i.e., phone calls) in
a specific time window and it is crucial in order to highlight
phone call cascades during criminal events.



5.7 Stacked Histograms 

Fig. 10 The Stacked Histogram Tool is helpful to visually
summarize the communications among actors elapsed in a
given time interval.

The last tool of LogAnalysis described in this work
is called Stacked Histograms. This tool strengthens the
temporal analysis features provided by LogAnalysis and
is shown in Figure 10. It works as follows. Similarly to the
Time Flow Analyzer tool, in the Stacked Histograms the x-axis
represents the time flow, but the y-axis represents the number
of phone calls in the given interval. Each actor is assigned a
stack, whose color and intensity are proportional to the
number of phone calls related to the given individual during
the specific period of interest considered by the forensic
analyst. In detail, the intensity of the color with which the
stacked histograms are drawn is related to the absolute number
of phone calls (incoming and outgoing contacts) of each actor,
while the thickness of the histogram may represent the
in-degree or the out-degree of the given user on that day
(highlighting those actors who are more popular and those who
are more gregarious). The Stacked Histograms tool is helpful
to get a picture of the phone call activity of the set of
considered actors during a specific time window. Finally, it
is particularly instrumental to understand the proportion
between the phone activity of a given actor and that of the
other individuals in contact with him/her (i.e., his/her
ego-network) in that specific time period.
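
The data behind a stacked histogram of this kind can be prepared by counting, for every day, the calls in which each actor takes part. The Python sketch below counts incoming and outgoing contacts together and leaves the drawing aside; the record format, the function name and the sample calls are assumptions.

```python
from collections import Counter, defaultdict
from datetime import datetime

def daily_call_stacks(calls):
    """For each day, count the calls (incoming plus outgoing) of every actor."""
    stacks = defaultdict(Counter)
    for caller, callee, when in calls:
        day = when.date()
        stacks[day][caller] += 1  # outgoing contact of the caller
        stacks[day][callee] += 1  # incoming contact of the callee
    return stacks

# Hypothetical call records
calls = [
    ("A", "B", datetime(2010, 5, 1, 9, 0)),
    ("A", "C", datetime(2010, 5, 1, 10, 0)),
    ("B", "A", datetime(2010, 5, 2, 8, 0)),
]
for day, stack in sorted(daily_call_stacks(calls).items()):
    print(day, dict(stack))
```

Each day yields one column of the histogram, and the count of every actor gives the height of his or her stack within that column.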



6 Conclusions 

The analysis of networks of phone traffic for investigative
and forensic activities, aimed at discovering the relational
dynamics among individuals belonging to criminal
associations, is a hard task. Our goal was to develop
a systematic model of analysis that simplifies the
exploration of networks built from large collections of
mobile phone traffic data. Our approach is based on Social
Network Analysis studies, which developed useful techniques
to tackle the problem. Nevertheless, few useful tools have so
far supported this type of network analysis. The tool we
developed, LogAnalysis, supports the exploration of networks
representing mobile phone traffic. It employs visual and
statistical features in order to help in discovering cohesive
groups, key figures and individuals acting as links.
LogAnalysis helps in systematically and flexibly obtaining
measures typical of SNA in order to find outlier/anomalous
values. Users can interactively identify sub-groups and
focus on interesting actors of the network. In addition,
the tool includes the possibility of exploring the temporal
evolution of the network structure and the temporal
information flow.

Future improvements to LogAnalysis will concern the
geo-spatial analysis of phone traffic networks and the
implementation of novel measures of centrality 0H],
community detection algorithms and graph visualization
techniques.